Augmented MDP
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language (0.67)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Middle East > Jordan (0.04)
DAC: The Double Actor-Critic Architecture for Learning Options
We reformulate the option framework as two parallel augmented MDPs. Under this novel formulation, all policy optimization algorithms can be used off the shelf to learn intra-option policies, option termination conditions, and a master policy over options. We apply an actor-critic algorithm on each augmented MDP, yielding the Double Actor-Critic (DAC) architecture. Furthermore, we show that, when state-value functions are used as critics, one critic can be expressed in terms of the other, and hence only one critic is necessary. We conduct an empirical study on challenging robot simulation tasks. In a transfer learning setting, DAC outperforms both its hierarchy-free counterpart and previous gradient-based option learning algorithms.
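To make the two-augmented-MDP view concrete, here is a minimal, hypothetical PyTorch sketch of a DAC-style update. All names (`master`, `intra`, `update`, the toy dimensions) are illustrative assumptions rather than the paper's code; option termination conditions are omitted for brevity, and a single state-value critic serves both actors, in line with the abstract's observation that only one critic is needed.

```python
# Minimal, illustrative DAC-style update: two actors (a master policy over
# options and per-option intra-option policies) are trained on their
# respective augmented MDPs, sharing one state-value critic.
import torch
import torch.nn as nn

S, O, A = 8, 3, 4  # toy sizes: state dim, number of options, number of actions

master = nn.Linear(S, O)                                    # logits of pi(o | s)
intra = nn.ModuleList([nn.Linear(S, A) for _ in range(O)])  # logits of pi_o(a | s)
critic = nn.Linear(S, 1)                                    # shared critic v(s)
opt = torch.optim.Adam(
    [*master.parameters(), *intra.parameters(), *critic.parameters()], lr=1e-3)

def update(s, o, a, r, s_next, gamma=0.99):
    """One actor-critic step on both augmented MDPs (hypothetical API)."""
    with torch.no_grad():  # advantage estimated from the shared critic
        adv = (r + gamma * critic(s_next) - critic(s)).squeeze(-1)
    logp_master = torch.log_softmax(master(s), dim=-1)[:, o]   # high-level MDP
    logp_intra = torch.log_softmax(intra[o](s), dim=-1)[:, a]  # low-level MDP
    actor_loss = -(adv * (logp_master + logp_intra)).mean()
    td_err = r + gamma * critic(s_next).detach() - critic(s)   # critic target
    opt.zero_grad()
    (actor_loss + td_err.pow(2).mean()).backward()
    opt.step()
```

A usage example: `update(torch.randn(1, S), o=1, a=2, r=torch.ones(1, 1), s_next=torch.randn(1, S))` performs one gradient step on both actors and the shared critic.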
Reviews: Explicit Planning for Efficient Exploration in Reinforcement Learning
This paper introduces the interesting idea of demand matrices for more efficient pure exploration. A demand matrix simply specifies the minimum number of times each state-action pair must be visited. The demand is then treated as an additional component of the state in an augmented MDP, which can be solved to derive the optimal exploration strategy for satisfying the specified initial demand. While the idea is interesting and solid, there are downsides to the idea itself, and some of the analysis in this paper could be improved upon. There are no theoretical guarantees that running this algorithm while simultaneously learning a model will work.
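As a rough illustration of how a demand can be folded into the state, here is a minimal, hypothetical Python sketch; the tabular setting, class, and method names are assumptions for illustration, not the paper's construction:

```python
# Illustrative sketch of a demand-augmented state for pure exploration.
# demand[(s, a)] is the minimum number of remaining visits required for (s, a);
# the augmented state is (env_state, view of the demand table), and the
# exploration objective is met once every entry reaches zero.
class DemandAugmentedState:
    def __init__(self, init_demand):
        # init_demand: dict mapping (state, action) -> required visit count
        self.demand = dict(init_demand)

    def step(self, s, a):
        """Record a visit to (s, a) and decrement its remaining demand."""
        if self.demand.get((s, a), 0) > 0:
            self.demand[(s, a)] -= 1

    def done(self):
        """All demands satisfied: pure exploration is finished."""
        return all(v == 0 for v in self.demand.values())

    def key(self):
        """Hashable augmented-state component for tabular planning."""
        return tuple(sorted(self.demand.items()))
```

Because the remaining demand is part of the augmented state, a planner over this augmented MDP can reason about which demands to satisfy in what order.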
How to Solve Contextual Goal-Oriented Problems with Offline Datasets?
Fan, Ying, Li, Jingling, Swaminathan, Adith, Modi, Aditya, Cheng, Ching-An
We present a novel method, Contextual goal-Oriented Data Augmentation (CODA), which uses commonly available unlabeled trajectories and context-goal pairs to solve Contextual Goal-Oriented (CGO) problems. By carefully constructing an action-augmented MDP that is equivalent to the original MDP, CODA creates a fully labeled transition dataset under training contexts without additional approximation error. We conduct a novel theoretical analysis to demonstrate CODA's capability to solve CGO problems in the offline data setup. Empirical results also showcase the effectiveness of CODA, which outperforms other baseline methods across various context-goal relationships of the CGO problem. This approach offers a promising direction for solving CGO problems using offline datasets.
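The data-augmentation step can be pictured with a toy sketch. The construction below is an assumption-laden illustration of the general idea (reward-0 labels for unlabeled dynamics data, labeled goal transitions into an absorbing state via an extra action), not the paper's actual algorithm; `ABSORB`, `DECLARE`, and `build_labeled_dataset` are hypothetical names.

```python
# Hypothetical sketch of a CODA-style labeled dataset (conventions assumed):
# dynamics transitions receive reward 0 under every training context, while
# each context-goal pair supplies a labeled transition into an absorbing
# success state via an extra "declare-goal" action of the augmented MDP.
ABSORB = "absorbing"      # assumed absorbing success state
DECLARE = "declare_goal"  # assumed extra action in the augmented MDP

def build_labeled_dataset(trajectories, context_goal_pairs):
    data = []
    # Unlabeled dynamics data: reward 0 in every context (goal not declared).
    for context, _ in context_goal_pairs:
        for (s, a, s_next) in trajectories:
            data.append((context, s, a, 0.0, s_next))
    # Context-goal data: declaring the goal at a goal state yields reward 1.
    for context, goal_state in context_goal_pairs:
        data.append((context, goal_state, DECLARE, 1.0, ABSORB))
    return data
```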
Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
Chen, Yu, Zhang, Xiangcheng, Wang, Siwei, Huang, Longbo
Reinforcement learning (RL) [43] has emerged as a powerful framework for sequential decision-making in dynamic and uncertain environments. While traditional RL methods, predominantly focused on maximizing the expected return, have seen significant advancements through approaches such as Q-learning [37, 25] and policy gradients [28, 10], they often fall short in real-world scenarios demanding strict risk control, such as financial investment [9], medical treatment [16], and autonomous driving [11]. The significance of risk management in RL has led to the emergence of Risk-Sensitive RL (RSRL). Unlike risk-neutral RL, which primarily focuses on maximizing expected returns, RSRL seeks to optimize risk metrics of the cumulative reward, such as entropic risk measures (ERM) [17, 18] or conditional value-at-risk (CVaR) [46], thereby emphasizing its distributional characteristics. However, the traditional Q-learning-based RL framework, which considers only the mean of the reward-to-go and the corresponding Bellman equation, cannot efficiently capture these distributional characteristics. There has therefore been an upsurge of interest in Distributional RL (DisRL), owing to its capacity to capture the intrinsic distributional attributes of cumulative rewards, and it has already achieved significant empirical success in risk-sensitive tasks [8, 14, 30, 45, 34].
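For concreteness, both risk measures named above have simple empirical estimators over samples of the cumulative reward; the following is a minimal NumPy sketch (the function names and the equally-weighted-sample assumption are ours):

```python
import numpy as np

def cvar(returns, tau):
    """Lower-tail CVaR at level tau: mean of the worst tau-fraction of returns."""
    x = np.sort(np.asarray(returns))
    k = max(1, int(np.ceil(tau * len(x))))
    return x[:k].mean()

def erm(returns, beta):
    """Entropic risk measure -(1/beta) * log E[exp(-beta * X)]; risk-averse
    for beta > 0, and it recovers the mean as beta -> 0."""
    x = np.asarray(returns)
    return -np.log(np.mean(np.exp(-beta * x))) / beta

rng = np.random.default_rng(0)
rets = rng.normal(1.0, 2.0, size=10_000)  # toy return samples
print(cvar(rets, 0.1))  # mean of the worst 10% of returns
print(erm(rets, 1.0))   # entropic risk at beta = 1
```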
- Asia > Middle East > Jordan (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Health & Medicine (0.87)
- Information Technology > Security & Privacy (0.34)
Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR
Wang, Kaiwen, Kallus, Nathan, Sun, Wen
In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm bandits (MABs), we show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes, and that it is achieved by an Upper Confidence Bound algorithm with a novel Bernstein bonus. For online RL in tabular Markov Decision Processes (MDPs), we show a minimax regret lower bound of $\Omega(\sqrt{\tau^{-1}SAK})$ (with normalized cumulative rewards), where $S$ is the number of states, and we propose a novel bonus-driven Value Iteration procedure. We show that our algorithm achieves the optimal regret of $\widetilde O(\sqrt{\tau^{-1}SAK})$ under a continuity assumption and in general attains a near-optimal regret of $\widetilde O(\tau^{-1}\sqrt{SAK})$, which is minimax-optimal for constant $\tau$. This improves on the best available bounds. By discretizing rewards appropriately, our algorithms are computationally efficient.
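At the bandit level, the quantity the algorithm tracks is an optimistic estimate of each arm's CVaR. The sketch below is a hypothetical illustration of that template with a generic Hoeffding-style bonus scaled by $1/\sqrt{\tau n}$ (consistent with the $\sqrt{\tau^{-1}AK}$ rate), not the paper's Bernstein bonus:

```python
import numpy as np

def empirical_cvar(samples, tau):
    """Mean of the worst tau-fraction of samples (lower-tail CVaR estimate)."""
    x = np.sort(np.asarray(samples))
    k = max(1, int(np.ceil(tau * len(x))))
    return x[:k].mean()

def cvar_ucb_choose(history, tau, t, c=1.0):
    """Pick the arm with the highest optimistic CVaR estimate at round t.
    history: list of per-arm reward lists. The bonus is a generic
    Hoeffding-style term for illustration, NOT the paper's Bernstein bonus."""
    scores = []
    for rewards in history:
        n = len(rewards)
        if n == 0:
            scores.append(np.inf)  # pull each unexplored arm once
            continue
        bonus = c * np.sqrt(np.log(t + 1) / (tau * n))
        scores.append(empirical_cvar(rewards, tau) + bonus)
    return int(np.argmax(scores))

# Toy usage: three arms, two already sampled; the unexplored arm is chosen.
print(cvar_ucb_choose([[0.2, 0.5], [0.9], []], tau=0.1, t=3))
```

The $\tau^{-1/2}$ scaling of the bonus reflects that only roughly a $\tau$-fraction of an arm's samples inform its lower-tail estimate.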
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Middle East > Jordan (0.04)